Scheduling File Transfers for Data-Intensive Jobs on Heterogeneous Clusters
Abstract
This paper addresses the problem of efficiently scheduling, as a collective, the file transfers requested by a batch of tasks. Our work targets a heterogeneous collection of storage and compute clusters. The goal is to minimize the overall time needed to transfer the files to their respective destination nodes. Two scheduling schemes are proposed and experimentally evaluated against an existing approach, Insertion Scheduling. The first is a 0-1 Integer Programming formulation built on the idea of time-expanded networks. This scheme achieves the minimum total file transfer time but incurs significant scheduling overhead. To address this issue, we propose a heuristic based on maximum weight graph matching. This heuristic performs as well as Insertion Scheduling with much lower scheduling overhead. We conclude that the heuristic scheme is a better fit for larger workloads and systems.
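The abstract does not spell out the matching heuristic, so the following is only a generic illustration of how maximum weight matching can drive round-based transfer scheduling, not the authors' algorithm. Everything in it is an assumption made for the sketch: the names `max_weight_matching` and `schedule_rounds`, the rule that each node takes part in at most one transfer per round, the edge weight (the inverse of the transfer time), and the toy data. The matching is computed exactly by memoized search, which is practical only for small instances.

```python
from functools import lru_cache


def max_weight_matching(edges):
    """Exact maximum weight bipartite matching for small inputs.

    edges maps (source, destination) -> positive weight.
    Returns (total_weight, tuple of chosen (source, destination) pairs).
    """
    sources = sorted({s for s, _ in edges})
    dests = sorted({d for _, d in edges})
    dest_bit = {d: 1 << i for i, d in enumerate(dests)}

    @lru_cache(maxsize=None)
    def best(i, used_mask):
        if i == len(sources):
            return 0.0, ()
        # Option 1: leave source i idle in this round.
        best_w, best_pairs = best(i + 1, used_mask)
        src = sources[i]
        # Option 2: pair source i with any free destination it has an edge to.
        for d in dests:
            if (src, d) in edges and not used_mask & dest_bit[d]:
                w, pairs = best(i + 1, used_mask | dest_bit[d])
                w += edges[(src, d)]
                if w > best_w:
                    best_w, best_pairs = w, ((src, d),) + pairs
        return best_w, best_pairs

    return best(0, 0)


def schedule_rounds(transfers, bandwidth):
    """Group transfers into rounds; each node is used at most once per round.

    transfers: list of (file_name, source, destination, size_mb), unique names.
    bandwidth: dict (source, destination) -> MB/s (hypothetical link speeds).
    """
    pending = list(transfers)
    rounds = []
    while pending:
        # For each node pair keep the pending transfer with the highest weight.
        best_for_pair = {}
        for name, src, dst, size in pending:
            weight = bandwidth[(src, dst)] / size  # assumed weight: 1 / transfer time
            pair = (src, dst)
            if pair not in best_for_pair or weight > best_for_pair[pair][0]:
                best_for_pair[pair] = (weight, (name, src, dst, size))
        edges = {pair: w for pair, (w, _) in best_for_pair.items()}
        _, chosen = max_weight_matching(edges)
        this_round = [best_for_pair[pair][1] for pair in chosen]
        rounds.append(this_round)
        pending = [t for t in pending if t not in this_round]
    return rounds


# Toy instance: two storage nodes (s1, s2) and two compute nodes (c1, c2).
transfers = [
    ("f1", "s1", "c1", 100.0),
    ("f2", "s1", "c2", 100.0),
    ("f3", "s2", "c1", 100.0),
]
bandwidth = {("s1", "c1"): 10.0, ("s1", "c2"): 10.0, ("s2", "c1"): 10.0}
rounds = schedule_rounds(transfers, bandwidth)
for number, group in enumerate(rounds, start=1):
    print(number, [name for name, _, _, _ in group])
```

In this toy instance the matching pairs s1 with c2 and s2 with c1 in the first round, so two transfers run in parallel and only f1 is left for the second round. A production scheduler would replace the exhaustive search with a polynomial-time matching algorithm and a weight function tuned to the real system.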
Similar Resources
Scheduling of Tasks with Batch-shared I/O on Heterogeneous Systems
This paper proposes a novel strategy that uses hypergraph partitioning and K-way iterative mapping-refinement heuristics for scheduling a batch of data-intensive tasks with batch-shared I/O behavior on heterogeneous collections of storage and compute clusters. The strategy formulates file sharing among tasks as a hypergraph to minimize the I/O overheads due to duplicate file transfers and emplo...
Network and Data Location Aware Job Scheduling in Grid: Improvement to GridWay Metascheduler
Grid Computing has enabled us to utilize the unused computing power (CPU cycles) of computers connected to networks (e.g. Internet). Nowadays, there are lots of scientific projects going on in the domain of High Energy Physics (HEP) and Grid infrastructure constitutes the core computing facility of these projects. One such project is LHC (Large Hadron Collider) deployed at CERN. These experimen...
Hierarchical Replication Strategy for Adaptive Scoring Job Scheduling in Grid Computing
Grid technology, which connects a number of personal computer clusters with high-speed networks, can reach the same computing power as a supercomputer at minimal cost. However, a grid is a heterogeneous system, which makes scheduling independent tasks on it more difficult. To utilize the power of the grid fully, we need an efficient job scheduling algorithm to execute ...
A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability
A Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. The heterogeneity and geographic dispersion of grid resources and applications pose complex problems such as job scheduling. Most existing scheduling algorithms in Grids only focus on one kind of Grid jobs which can be data...
An Improved Adaptive Space-Sharing Scheduling Policy for Non-dedicated Heterogeneous Cluster Systems
Adaptive space-sharing scheduling algorithms tend to improve the performance of clusters by allocating processors to jobs based on the current system load. The focus of existing adaptive algorithms is on dedicated homogeneous and heterogeneous clusters. However, commodity clusters are naturally non-dedicated and tend to become heterogeneous over time as cluster hardware is usually upgraded and n...